Improving Text representations through Probabilistic Integration of Synonymy Relations
نویسندگان
چکیده
The present contribution focuses on the integration of word senses in a vector representation of texts, using a probabilistic model. The vector representation under consideration is the DSIR model, that extends the standard Vector Space (VS) model by taking both occurrences and co-occurrences of words into account. Integration of word senses into the co-occurrence model is done using a Markov Random Field model with hidden variables, using semantic information derived from synonymy relations extracted from a synonym dictionary.
منابع مشابه
Readers’ Perceptions of Lexical Cohesion in Text
Preliminary results from an experimental study of readers’ perceptions of lexical cohesion and lexical semantic relations in text are presented. Readers agree on a common “core” of groups of related words and exhibit individual differences. The majority of relations reported are “non-classical” (not hyponymy, meronymy, synonymy, or antonymy). A group of commonly used relations is presented. The...
متن کاملImproving Probabilistic Latent Semantic Analysis with Principal Component Analysis
Probabilistic Latent Semantic Analysis (PLSA) models have been shown to provide a better model for capturing polysemy and synonymy than Latent Semantic Analysis (LSA). However, the parameters of a PLSA model are trained using the Expectation Maximization (EM) algorithm, and as a result, the trained model is dependent on the initialization values so that performance can be highly variable. In th...
متن کاملQuantifying the Impact and Extent of Undocumented Biomedical Synonymy
Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships....
متن کاملSparse Overcomplete Word Vector Representations
Current distributed representations of words show little resemblance to theories of lexical semantics. The former are dense and uninterpretable, the latter largely based on familiar, discrete classes (e.g., supersenses) and relations (e.g., synonymy and hypernymy). We propose methods that transform word vectors into sparse (and optionally binary) vectors. The resulting representations are more ...
متن کاملMental Representations of Lyrical Prose
The article analyzes mental representations of Russian lyrical prose texts. The texts demonstrate collective memory engrams that are defined by cultural and historical legacy of the nation and authors’ creative world perception. In architectonics of a lyrical prose text, sense perception reveals itself in accumulated underlying meanings and wisdom conveyed by expressive means. The author’s inte...
متن کامل